HTML Basics: Terminology
Introduction
Tags, content and presentation
In its most basic form, an HTML document consists of text, enclosed in
tags. These tags (more accurately, these elements) describe the
meaning of the text they contain, rather than how the enclosed text should
be displayed. This concept is called content-based markup, as
opposed to presentational markup.
Content-based markup allows device
independence; knowing the meaning of a piece of text allows a browser to
render it as well as possible on the platform it is running on. With
presentational markup this is impossible. Without knowing why a
string of text must be displayed in red 20-point Helvetica, a browser can't
pick a good alternative way to display it on a screen where this font isn't
available.
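As a small illustration of the difference (a sketch only; the sentence and the book title are made up for this example), the first paragraph below says what the text is, while the second only says how it should look:

```html
<!-- Content-based: the browser is told this is a citation -->
<P>I just finished <CITE>An Example Book</CITE>.</P>

<!-- Presentational: the browser is only told to use italics -->
<P>I just finished <I>An Example Book</I>.</P>
```

A browser that cannot show italics can still choose some other rendering for the citation, but it has nothing to go on for the second paragraph.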
Using tags
An element, when used in a document, consists of an
opening and a closing tag. The closing tag is not always used. It might
be optional, or even forbidden. Elements which have opening
and closing tags are referred to as container elements, and
elements without a closing tag as empty elements.
Container tags may not overlap each other. Always close the innermost
container first, if you are nesting them.
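The rules above can be sketched as follows. The text inside the tags is made up for the example; the tags themselves are ordinary HTML tags.

```html
<!-- Container element: an opening and a closing tag -->
<EM>emphasized text</EM>

<!-- Empty element: the closing tag is forbidden -->
<BR>

<!-- Container element whose closing tag is optional -->
<P>A paragraph of text.

<!-- Correct nesting: the innermost container (EM) is closed first -->
<STRONG>very <EM>important</EM></STRONG>

<!-- Incorrect: the two containers overlap -->
<STRONG>very <EM>important</STRONG></EM>
```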
An opening tag can have certain attributes. These provide extra
information about the tag and the text it encloses, if any. For
example, the A tag has an HREF attribute which defines the
destination that the anchored text links to.
The attribute may have a value, although this is not necessary in all
cases. If it has a value, it is specified in the "name=value" form.
The value must be enclosed in quotes if it contains anything more
than letters, digits, hyphens and/or periods. In all other cases, quoting
is optional. The maximum length for an attribute value is 1024 characters,
including the quotation marks (if used).
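A short sketch of these attribute forms follows. The URL and the file name are hypothetical and only serve the example.

```html
<!-- The value contains ":" and "/", so it must be quoted -->
<P><A HREF="http://www.example.com/index.html">Example site</A></P>

<!-- Only letters, digits and a period: the quotes are optional -->
<P><IMG SRC=logo.gif ALT=Logo></P>

<!-- An attribute without a value -->
<UL COMPACT>
<LI>First item
</UL>
```

Quoting every value is always allowed, so when in doubt it is simplest to add the quotes.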
The generic structure
The document can be divided into two parts, the head and the
body. The document head provides information about the document, for
example its title, the author and a short description
(there's a separate section on using the
document head in the HTML Basics series).
The document body holds the actual contents of the document.
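A minimal sketch of this structure might look like the following. The title, author name and description are placeholders, and the META tags are just one possible way to record an author and a description.

```html
<HTML>
<HEAD>
<!-- The head: information about the document -->
<TITLE>Example document</TITLE>
<META NAME="author" CONTENT="A. N. Author">
<META NAME="description" CONTENT="A short example document.">
</HEAD>
<BODY>
<!-- The body: the actual contents -->
<P>The contents of the document go here.</P>
</BODY>
</HTML>
```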
The document body is built up with so-called block elements or block-level
tags. A block element marks up a section of text
and assigns it a
particular meaning. For example, you can indicate that a section of text
is a heading, a large quotation or an item in a list. There are also block
elements which may only contain other block elements and no text. These
elements include lists (which may only contain list items) and tables
(which may only contain table rows full of cells). Some block elements may
contain other block elements, instead of only text. These are sometimes
referred to as super-block elements.
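For instance, the block elements below mark a heading, a paragraph and a large quotation. The text is made up for the example.

```html
<!-- A heading -->
<H2>About this page</H2>

<!-- A paragraph of ordinary text -->
<P>This page explains some basic terminology.</P>

<!-- A large quotation, which itself contains another block element -->
<BLOCKQUOTE>
<P>A longer passage quoted from another source.</P>
</BLOCKQUOTE>
```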
Block elements which may not contain text are used to hold certain
block elements together, so they form a logical unit. A list is a good
example of this; it groups all the list items inside together, so the
browser knows the items are part of the same list. A slightly more
complex example is the table. An HTML table is built up from rows of
cells, and the table tag itself contains an optional caption, followed
by one or more rows. The rows may only contain header or data cells,
and the cells themselves may contain almost every element.
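The list and table structure described above can be sketched like this. The items and prices are invented for the example.

```html
<!-- A list may only contain list items -->
<UL>
<LI>First item
<LI>Second item
</UL>

<!-- A table: an optional caption, then rows of header or data cells -->
<TABLE>
<CAPTION>Example prices</CAPTION>
<TR><TH>Item</TH><TH>Price</TH></TR>
<TR><TD>Apple</TD><TD>1.00</TD></TR>
<TR><TD>Pear</TD><TD>1.25</TD></TR>
</TABLE>
```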
Special cases
A super-block element assigns a meaning to a set of block
elements. The division tag, DIV, is probably the best example. It can
be used to set a default alignment or style attributes for all the
block elements it contains. This is easier to do than setting that
property for each block element inside.
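As a sketch, the DIV below centers both block elements it contains, so the ALIGN attribute does not have to be repeated on each of them. The heading and paragraph text are placeholders.

```html
<DIV ALIGN=center>
<H2>A centered heading</H2>
<P>A centered paragraph.</P>
</DIV>
```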
A special case is the preformatted text container. It is the only
container in which line breaks and spacing are rendered exactly as they
appear in the source. This is very useful if you are inserting
ASCII art, or text which requires a specific layout and spacing, for
example the source for a program.
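A sketch of program source inside the preformatted text container, PRE, is shown below. The small C program is only an illustration.

```html
<!-- Line breaks and indentation inside PRE are kept as written -->
<PRE>
#include &lt;stdio.h&gt;

int main(void)
{
    printf("Hello, world\n");
    return 0;
}
</PRE>
```

Note that the characters < and > still have to be written as &lt; and &gt; inside PRE, because tags are still recognized there.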
Inside the block elements, the actual text is found. This text should be
written only with characters in the ISO Latin 1
character set. In HTML, spaces and newlines are considered identical.
They are referred to as whitespace, and if multiple whitespace
characters are used in sequence, the browser should display only a
single space.
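For example, a browser should display the paragraph below as a single run of text, "These words are separated by one space each.", no matter how the source is laid out:

```html
<P>These   words
are    separated
by one space each.</P>
```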
Depending on the block, the text inside it may also be marked up. In
general, the text-level tags used for this can be divided into
three categories:
- Appearance (font tags), which change the appearance of the text.
- Logical (phrase tags), which assign text a particular meaning.
- Special tags, which assign text a particular functionality.
Appearance/font tags
Font tags are used to change the appearance of the text. This
includes font size changes, boldface, italics and super/subscript.
However, if a browser can't perform the appearance change, it has no
way to determine a good alternative. As said above, without knowing
why this font change should be performed, the browser can't
pick another way to display/process the text. A search engine can't know
something in italics is a book title unless you tell it.
This limitation can cause problems if your document depends on this
appearance change. There is no guarantee or requirement that a browser
will display a font tag in the way the name suggests.
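The sketch below uses a few font tags; the text is made up. It also shows how a document can come to depend on the appearance change: a browser that cannot display superscript might show the formula as "E = mc2", which reads quite differently.

```html
<P><B>Bold</B> and <I>italic</I> text, <BIG>larger</BIG> and
<SMALL>smaller</SMALL> text, E = mc<SUP>2</SUP> and H<SUB>2</SUB>O.</P>
```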
Logical markup
It's not always necessary to use a font tag. Often the change in
appearance is an attempt to assign a special meaning to the text. For
example, italics are often used for citations or emphasized text. In these
cases, a better approach is to use a logical tag to indicate this
meaning. The browser can now pick the best way to display that kind of
text on the screen.
For example, if the browser does not support italics,
it can still display citations and emphasized text correctly, although
probably in a different fashion.
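A sketch of the same idea with logical tags follows. The sentence, the book title and the command are invented for the example.

```html
<!-- Each tag states what the text is, not how it should look -->
<P>You <EM>must</EM> read <CITE>An Example Book</CITE> before
typing <CODE>make install</CODE>. <STRONG>Make a backup first.</STRONG></P>
```

How each of these is displayed is left to the browser: emphasis might become italics on one platform and underlining or a different color on another.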
Special markup
The third category, special tags, does not assign meaning or an
appearance change to text, but functionality instead. The most
common example is the hyperlink, which assigns a connection to another
document to the enclosed text. Inline images also fall in this category.
Strangely enough, the Wilbur specification also includes the FONT tag in
this group, although it is clearly an appearance tag.
The three building blocks for HTML forms (INPUT, TEXTAREA and SELECT)
are also text-level tags, and can be grouped in the "Special" category.
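The special tags mentioned above could be used as in the sketch below. The file names, the form's ACTION URL and the field names are all hypothetical.

```html
<!-- A hyperlink and an inline image -->
<P>See <A HREF="other.html">another document</A> or this picture:
<IMG SRC="photo.gif" ALT="A photo"></P>

<!-- The three form building blocks, used at text level inside paragraphs -->
<FORM ACTION="/cgi-bin/example" METHOD=post>
<P>Name: <INPUT TYPE=text NAME=name></P>
<P>Comment: <TEXTAREA NAME=comment ROWS=3 COLS=40></TEXTAREA></P>
<P>Color: <SELECT NAME=color><OPTION>Red<OPTION>Blue</SELECT></P>
<P><INPUT TYPE=submit VALUE="Send"></P>
</FORM>
```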
A final note
In almost all cases, you can use each text-level tag inside another one,
even when this doesn't make sense. There is no way to prevent this in the
specification, so it's up to the author to use only meaningful constructs.
If a meaningless construct is used (such as, for example,
<EM><INPUT TYPE=radio NAME=foo></EM>), you
can get unexpected results if a browser tries to render it.
Copyright © 1996 Arnoud "Galactus" Engelfriet.